Classification by human inspection is a time-consuming, error-prone
task involving huge amounts of data. Computer-assisted machine
learning and image analysis techniques have achieved significant
results in image processing. In this study, we use supervised and
deep learning classifiers to detect and classify tumors using the
MRI images from the BRATS 2020 dataset. At the outset, the proposed
system classifies images as healthy (normal) brains or brains with
tumorous growth. We employ four supervised machine learning
classifiers, SVM, Decision tree, Naive Bayes and Linear Regression,
for the binary classification. The highest accuracy (96%) was
achieved with SVM and DT, with SVM giving a better Recall rate of
98%. Thereafter, categorization of the tumor as Pituitary adenoma,
Meningioma, or Glioma is performed using supervised (SVM, DT)
classifiers and a 6-layer Convolution Neural Network. CNN performs
better than the other classifiers, with a 93% accuracy and 92%
recall rate. The suggested system is employable as a powerful
decision-support tool to assist radiologists and oncologists in
clinical diagnosis without requiring invasive procedures like a
biopsy.

classification, Machine learning, Support vector

Introduction

Medical image processing makes use of various types
of scans such as CT (Computed Tomography),
Ultrasound, PET (Positron Emission Tomography), MRI
(Magnetic Resonance Imaging), Spectroscopy, etc.
Among these, MRI is most widely used for diagnosis as it
is sensitive and powerful while also being noninvasive
(Badza et al., 2020; Khan et al., 2020). MRI scans
provide detailed information: radio waves and magnetic
fields are used to create pictures of the internal organs,
effectively detecting cysts, tumors, swelling or bleeding
of organs. Analysis and classification of these scans lead
to the identification of any irregular growth. Early
detection of abnormal tissue growth is one of the main
issues in medical image processing. Precise estimation of
the abnormal tissue growth aids in a better prognosis and
post-operative treatment. Many diseases can be treated
more effectively, and patients have a higher chance of
surviving, with early and correct detection.


The fundamental unit of the human body is a cell.
Tumor formation is caused by the body's cells growing
irregularly or abnormally. These tumorous regions may
have different shapes and sizes. Different image
intensities in the scan capture these regions. Figure 1
shows MRI scans of a normal brain and of brains with
tumorous growth. A tumor can be benign or malignant.
The differentiating feature between them is their
structure. While benign tumors have a uniform,
homogeneous structure, malignant or cancerous tumors
form heterogeneous structures. Benign tumors are
non-cancerous and can be surgically removed, as they
seldom grow back. Malignant tumors, however, contain
cancer cells and are a cause of much concern. These cells
tend to invade healthy cells in their proximity as well.
Also, a benign tumor can later become cancerous. A
low-grade tumor can metamorphose into a higher-grade
tumor. Therefore, timely detection and diagnosis of the
exact stage and grade of tumors are crucial for proper
treatment.


Figure 1. MRI scans of a normal brain and of brains with
different types of tumorous growth: (a) normal brain
(sagittal view), (b) pituitary adenoma (coronal view),
(c) glioma (sagittal view), (d) meningioma (axial view).


The brain is a complex organ of the human body that 
regulates the biological mechanisms and individual
characteristics of the body. Brain tumors occur when the 
brain cells start multiplying abnormally. Conventionally, 
image classification and tumor detection are done by 
human inspection (Hashmi and Osman, 2022). This is a 
complex, error-prone and time-consuming task, due to 
the huge amounts of data involved. The results depend on 
the expertise of the radiologists and are non-reproducible. 
Researchers have worked diligently to find the most 
accurate method for classifying tumors from MRI scans. 

Computer-aided image processing and the fields of
cancer and biomedicine have benefited greatly from the
use of machine-learning techniques. They aid
radiologists and oncologists in improving overall
surgical and diagnostic accuracy, and help in proper
prognosis, dose estimation and preparing a treatment plan
for the patient. Image processing techniques help detect
the tumour early, limiting the need for a biopsy.
Image classification is used to diminish the gap between
computer vision and human vision. The main challenge is
accurately detecting and classifying tumours from the
MRI scan (Irmak, 2021). Timely detection of any disease
helps in better treatment, saving patients' lives. For this
reason, the detection and classification of brain tumors
are of great importance and have been extensively
researched. This paper uses a Support Vector Machine,
Decision Tree, Naive Bayes and Linear Regression to
classify images as normal or tumorous. Thereafter, the
tumorous images are further classified as (i) Pituitary,
(ii) Meningioma, or (iii) Glioma using SVM, DT and a
6-layer Convolution Neural Network on the BRATS 2020
dataset. We compare the performance of all the classifiers
used based on the accuracy, precision, recall and F1 score
metrics. Section 2 of the paper sheds light on the work
presented in extant literature in the area of brain tumor
classification. Section 3 discusses the methodology
adopted in this work for preprocessing images and their
classification. In Section 4, we discuss the results
obtained from our experiments. Confusion matrices and
other evaluation measures including accuracy, precision,
recall, and F1 score are used to present the results. In
Section 5, we wrap up our analysis.


Relevant Work 


Research has been done extensively on classifying and
segmenting brain tumors from MRI images using
supervised and unsupervised techniques. Zacharaki
et al. (2009) apply ranking-based feature selection on
the Region of Interest (ROI) and classify using support
vector machine recursive feature elimination (SVMRFE).
Their proposed method achieved 98.2% accuracy for
GL2-GL4 (glioma grade II and IV). Gonzalez-Navarro et
al. (2010) use Magnetic Resonance Spectroscopy (MRS)
images for classification using LOO bootstrap and Naive
Bayes, Logistic Regression, Linear Discriminant,
Quadratic Discriminant, SVM with linear and quadratic
kernels (SVM-L and SVM-2) and SVM Radial (SVM-R).
Feature selection using Entropic filtering improved
classifier performance. They could achieve better results
for Short Echo Time data, with SVM-R giving the best
accuracy of 55.5, 88.2 and 87.2 (LET, SET and LSET data,
respectively). Naik et al. (2014) performed a comparison
of Naive Bayes and Decision Tree classifiers to classify
CT-Scan brain images into normal, benign and malignant.
They used Median Filtering with a 3x3 median filter for
de-noising, and Morphological Opening and Power law
Transformation for image enhancement. Their
experiments conclude that Decision Tree gave a better
accuracy of 96% compared to NB. Kumar et al. (2017)
also compared K-NN, SVM and Decision Trees for
classifying brain MRI scans. The images were
preprocessed by RGB-to-grayscale conversion and
Gaussian filtering for denoising. The dataset was trained
on a neural network, segmented using morphology and
clustering, and classified using K-NN, SVM and Decision
Trees. Their results affirm better performance by the
SVM classifier. In another study, Kumar et al. (2017)
worked on the SICAS Medical Image Repository. They
apply DWT for feature extraction, PCA for feature
selection, and SVM for classification, achieving linear
accuracy varying from 80% to 90%. Abd-Ellah et al.
(2016) experimented on data from Harvard Medical
School, MICCAI 2014 Machine Learning Challenge
(MLC). The MRI images were preprocessed for noise
removal, followed by feature extraction by DWT,
dimensionality reduction by PCA, and classification by
kernel support vector machine (KSVM). They could
achieve a maximum classification accuracy of 100% using
a Gaussian radial basis function (GRB) kernel with a
default scaling factor. Alfonse et al. (2016) make use of
Expectation Maximization (EM) for segmentation, Fast
Fourier Transform (FFT) for feature extraction,
Minimal-Redundancy-Maximal-Relevance criterion
(MRMR) for feature selection, and finally, SVM for
classification. They could achieve an accuracy of 98.9%
with their proposed model. Deepa et al. (2011) present a
survey of ML techniques used till 2011 for medical
image classification and segmentation. Their survey
noted that SVM and ANN for classification, FCM and
K-means for segmentation, and GA and PSO for feature
extraction are used effectively.

Havaei et al. (2016) used the BRATS 2013 dataset and
compared k-nearest neighbor classifier (KNN), support
vector machines (SVM), random forests and boosted
decision trees. They claim that SVMs gave superior
results. Chavan et al. (2015) used the WHO (World
Health Organization) data from the WBA (World Brain
Atlas) Website. They studied the performance of the K-NN
classifier. After denoising the images using a Gaussian
filter, contrast enhancement using Histogram
Equalization, segmentation using Thresholding and
feature extraction by GLCM (gray-level co-occurrence
matrix), the K-NN classifier gave 96.15% classification
accuracy. Ain et al. (2014) employ an ensemble-based
SVM classifier using weighted majority voting to
combine results. They applied Fast Discrete Curvelet
Transform for denoising, histogram-based and
co-occurrence matrix-based textural feature extraction,
and FCM for segmentation. Their ensembled classifier
gave higher accuracy as compared to SVM and ANN.
Keerthana (2018) proposes an SVM-GA model that
employs a Median filter for noise removal, GLCM (Gray
Level Co-occurrence Matrix) for feature extraction,
and SVM RBF (Radial Basis Function) for classification.
A genetic algorithm optimizes extracted features and
SVM parameters to improve classification performance.

Researchers have also employed many unsupervised
techniques. Logeswari et al. (2010) propose a model that
uses Weighted Median (WM) filters for noise reduction
and HSOM (Hierarchical Self Organizing Map) for
image segmentation. Akil et al. (2020) propose a fully
CNN architecture with Overlapping Patches inspired by
the Occipito-Temporal Pathway (OTP) for the
segmentation of high- and low-grade Glioblastomas on
the BRATS 2018 dataset. They applied the
class-weighting technique to segmentation results to
overcome the unbalanced data problem. They concluded
that CNN with Overlapping Patches provided good
segmentation results compared to adjacent patches
(Basheera and Ram, 2019; Mohan et al., 2022). Khan et
al. (2021) apply k-means clustering for segmenting
tumorous areas in the BRATS 2015 dataset. The
preprocessed image is classified into two categories,
benign or malignant, by a CNN model using VGG19
(Visual Geometry Group). They also use synthetic data
augmentation to increase the available data size. Ari and
Hanbay (2018) propose an ELM-LRF (Extreme Learning
Machine local receptive fields) framework for
classification. The model selects random weights for
convolution and pooling in the input layer, and uses the
least-squares method to calculate the weights between
the hidden layer and output layer. Tumor detection is
done by Watershed segmentation. They compared the
performance of their model with the Gabor
wavelets-based method, the statistical features-based
method, and a 6-layer CNN. They could achieve 97.18%
accuracy on the BRATS 2013 dataset with their proposed
method, which outperformed the other methods.
Krizhevsky et al. (2012) also employ a CNN, having five
convolutional and three fully-connected layers for
classification. They used data augmentation to reduce
overfitting. They could achieve top-1 and top-5 test set
error rates of 37.5% and 17.0%, which outperformed
state-of-the-art techniques. Zhang et al. (2001)
developed a Hidden Markov Random Field (HMRF) model
combined with an expectation-maximization (EM)
algorithm for fitting model parameters to segment normal
brain images into three tissue categories: Gray Matter
(GM), White Matter (WM) and Cerebrospinal Fluid (CSF).
Abdel-Maksoud (2015) employs K-means clustering
integrated with Fuzzy C means for brain tumor
segmentation.


Materials and methods 
Dataset and preprocessing 

The image dataset used for experimentation in this
study is the BraTS 2020 dataset available on Kaggle. The
dataset contains 2870 MRI scans in the Digital
Imaging and Communications in Medicine (DICOM)
standard format, with 395 images of the healthy or
normal brain without tumorous growth, 827 images of
pituitary tumor, 822 images of meningioma, and 826 of
glioma.

One of the most important tasks for accurate tumor
detection and classification is enhancing the image
through preprocessing techniques. The images were
preprocessed and cropped for image enhancement to aid
in the accurate detection and classification of tumorous
regions. Each input image was resized from 256 x 256 to
128 x 128 and normalized by dividing each pixel by 255.
Selection of effective features plays a key role in
classification performance. An optimum feature set helps
in achieving good classifier accuracy. We use PCA
(Principal Component Analysis) for selecting optimum
features and dimensionality reduction. We conducted
classification experiments with and without feature
selection and found that the accuracy of classifiers
increased when PCA was used to retain 1100 features.


Overview of proposed methodology

The preprocessed dataset is split into training and
validation datasets. Binary classification involves
classifying images as Normal (no tumor) brain or
Brain containing tumor. Four supervised machine
learning classifiers (SVM, Decision tree, Naive Bayes,
and Linear Regression) are used for this purpose. We
found the best accuracy (96%) with SVM-RBF and
Decision Tree classifiers, with SVM having an edge with
a better Recall rate of 98%. Hence, we use these two
supervised learning classifiers and a 6-layer CNN to
further classify images into four categories: (i) No
tumor, (ii) Pituitary tumor, (iii) Meningioma, (iv)
Glioma. On comparing the performance of the classifiers
based on our experiments, we found that CNN
outperforms the other two classifiers with an accuracy of
93% and a Recall rate of 92%. An overview of the
proposed methodology is presented in Figure 2.

Figure 2. Overview of proposed methodology:
Preprocessing (Image Enhancement, Cropping, Resizing),
Feature Selection, Model Training, Model Validation,
Classification Results (Binary with SVM, DT, LR, NB:
i. Normal Brain, ii. Tumorous Brain; Multi-class with
SVM, DT, CNN: i. Normal Brain, ii. Pituitary Tumor,
iii. Meningioma, iv. Glioma), Evaluate Performance
(Accuracy, Precision, Recall, F-Score), Output Results.
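
The resizing and rescaling steps described under Dataset and
preprocessing (256 x 256 to 128 x 128, then division by 255) can be
illustrated with a small framework-free sketch. The paper does not
state which interpolation method was used, so 2 x 2 block averaging
is assumed here purely for illustration; the function names and the
synthetic image are hypothetical.

```python
def downsample_2x(image):
    """Halve each dimension by averaging non-overlapping 2 x 2 blocks.

    Assumption: the resizing method is not given in the paper; block
    averaging is just one simple way to go from 256 x 256 to 128 x 128.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def rescale(image, max_value=255.0):
    """Map 8-bit pixel intensities into the range [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Synthetic 256 x 256 grayscale image with values in 0..255 (not real data).
image = [[(r + c) % 256 for c in range(256)] for r in range(256)]
small = rescale(downsample_2x(image))
print(len(small), len(small[0]))  # 128 128
```

Flattening each 128 x 128 image gives a 16384-dimensional vector,
which is the kind of input on which a dimensionality reduction such
as PCA would then operate.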


Binary classification 

We employ four supervised machine learning
classifiers to categorize preprocessed MRI images as
normal or tumorous. Binary classification (class 0: No
tumor, class 1: Brain containing tumor) is performed using
SVM with RBF (Radial Basis Function) kernel, Decision
tree, Naive Bayes and Linear Regression, with SVM
yielding the highest accuracy of 96% and Recall rate of
98%. In medical diagnosis, it is very important to keep
the false negative rate as low as possible, as it is
comparatively better to misclassify a normal image as
tumorous than to miss a tumor. Hence Recall plays an
important role in such cases. Out of 1222 images, 977 were used
for training and 245 for testing. After classification, 395
normal brain images were detected, and 827 were
detected with tumors.
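
To make the role of Recall concrete, the following sketch compares
two classifiers that have the same accuracy but different numbers of
false negatives. The counts are hypothetical and chosen only for
illustration; they are not results from our experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp: tumorous images predicted tumorous
    fp: normal images predicted tumorous
    fn: tumorous images predicted normal (missed tumors)
    tn: normal images predicted normal
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical test set: 100 tumorous and 100 normal images.
acc_a, prec_a, rec_a = binary_metrics(tp=98, fp=6, fn=2, tn=94)
acc_b, prec_b, rec_b = binary_metrics(tp=94, fp=2, fn=6, tn=98)

print(acc_a, rec_a)  # 0.96 0.98
print(acc_b, rec_b)  # 0.96 0.94
```

Both hypothetical classifiers are 96% accurate, but the first misses
two tumors while the second misses six, which is why Recall is used
to choose between classifiers of equal accuracy.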


Multi-class classification 
As the next step, we perform multi-class classification
using SVM-RBF, Decision tree classifier and a 6-layer
CNN to classify images into one of four categories: (i)
No tumor (0), (ii) Pituitary adenoma (1), (iii)
Meningioma (2), and (iv) Glioma (3). The input set of 2870
MRI images is split into two subsets in the ratio 80:20 for
training and testing, respectively. 2296 images were used
for the training dataset and 574 for validation. After
classification, we obtained 395 images with no tumor, 827
with a pituitary tumor, 822 with meningioma and 826 with
glioma.
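
The 80:20 split sizes reported for both experiments can be
reproduced with a short sketch. The shuffling, the seed and the
choice to round the test share up are assumptions made for
illustration; the paper does not describe how the split was drawn.

```python
import random


def split_indices(n_samples, test_percent=20, seed=0):
    """Shuffle sample indices and split them into train and test parts.

    Assumption: the test share is rounded up to a whole image, which
    reproduces the counts reported in the text.
    """
    n_test = (n_samples * test_percent + 99) // 100  # integer ceiling
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_multi, test_multi = split_indices(2870)
print(len(train_multi), len(test_multi))  # 2296 574

train_binary, test_binary = split_indices(1222)
print(len(train_binary), len(test_binary))  # 977 245
```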
The CNN model has been used widely in image
classification problems. A CNN uses convolution,
pooling and fully connected layers. Convolution and
pooling layers are used for feature extraction, where the
image is resized, rescaled, and denoised. In the
convolution layer a linear operation is applied to extract
features using a kernel, a small array of weights. The
kernel is applied all over the input to obtain an activation
map. This activation map is taken as input by the Pooling
layer, which further samples the dataset to reduce
dimensionality and speed up computation. After
repetitive sequences of convolution and pooling, the
Flattening layer converts the dataset into a long
continuous vector. This vector contains high-level image
features which are used by the Fully-connected layer for
classifying the image into various classes based on the
training data. The number of hidden units in the Fully
connected layer equals the number of classes the model
predicts.

Figure 3. Proposed CNN model (Input, Convolution,
Max-Pooling, Convolution, Max-Pooling, Flatten, Fully
Connected; n1 and n2 channels, n3 units).

We use a 6-layer CNN model with five convolution
layers with different kernel sizes and one fully connected
layer. The first layer uses a kernel size of 5 x 5, while the
second and third layers use a kernel size of 3 x 3. The
fourth and fifth layers employ a 2 x 2 size kernel.
Activation function ReLU is used across all five
convolution layers. Stride is the distance between two
successive kernel positions. Stride has been kept at 1, as
it is the most common choice, and padding is kept at zero
in the experiments. The main function of the Pooling
layer is to cut down on the number of learnable
parameters in order to improve model performance,
increase computation speed, reduce memory, and avoid
overfitting. In our experiments, we employ Max pooling
using a 3 x 3 kernel, zero padding and stride = 1. The
Fully connected layer, also known as dense layer, is used
for output. Every input is connected to every output by
some weight in this layer. The final task of classification
of images is performed here. Once the convolution layer
extracts all significant features and the pooling layer
samples the dataset, the fully connected layer is used for
the final classification output. In general, a ReLU
function follows each fully connected layer, while the
Softmax activation function is used at the output for
multi-class classification. This function gives output
values ranging from 0 to 1, with all values summing to 1.
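
The building blocks described above (convolution with stride 1 and
zero padding, ReLU, 3 x 3 max pooling with stride 1, and Softmax)
can be sketched in plain Python on a tiny input. This is a minimal
illustration of the operations, not the trained model: the 6 x 6
input, the 2 x 2 kernel weights and the class scores are all
hypothetical.

```python
import math


def conv2d_valid(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding).

    As in most CNN libraries, this is a cross-correlation: the kernel
    is not flipped before it is applied.
    """
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=3):
    """Max pooling with stride 1 and no padding."""
    out_rows = len(feature_map) - size + 1
    out_cols = len(feature_map[0]) - size + 1
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 6 x 6 input and 2 x 2 kernel.
image = [[float((r * 6 + c) % 5) for c in range(6)] for r in range(6)]
kernel = [[1.0, -1.0], [-1.0, 1.0]]

activation = relu(conv2d_valid(image, kernel))  # 5 x 5 activation map
pooled = max_pool(activation, size=3)           # 3 x 3 pooled map
probs = softmax([2.0, 1.0, 0.5, 0.1])           # four class scores
```

In a real model the kernel weights are learned during training and
each layer applies many kernels in parallel, one per output channel.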

We use TensorFlow and Keras to build the CNN
model. The input consists of 2870 MRI images. Each
image was resized to 128 x 128 and normalised by
dividing the pixel values by 255.
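
Before building such a model, the spatial size of each feature map
can be checked by hand with the usual formula
(n + 2 * padding - kernel) / stride + 1. The sketch below applies it
to the five kernel sizes given above for a 128 x 128 input. Where
the pooling layers sit between the convolutions is not specified in
the text, so only the convolution chain is traced, followed by a
single 3 x 3 pooling step as an example.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial size of a feature map after a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


size = 128
conv_sizes = []
for kernel in (5, 3, 3, 2, 2):  # kernel sizes of the five convolution layers
    size = output_size(size, kernel)
    conv_sizes.append(size)

print(conv_sizes)  # [124, 122, 120, 119, 118]

# One 3 x 3 max-pooling step with stride 1 shrinks the map by 2 pixels.
print(output_size(conv_sizes[-1], 3))  # 116
```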

Figure 4. Comparison of classifiers using confusion matrix.


Results and discussion 

The results for binary and multi-class classification 
are discussed separately for the sake of clarity. We use 
the Confusion Matrix, Accuracy, Precision, Recall, and 
F1 score to evaluate classifiers’ performance. 


Binary Classification 

Four supervised machine learning classifiers — SVM, 
Decision tree, Linear Regression, Naive Bayes, have been 
used for the binary classification of images in the dataset. 
Confusion matrices show classifier performance across 
different classes. Confusion matrices of each of these 
classifiers are depicted in Figure 4. Other performance 
metrics, such as accuracy, precision, recall and F1 score, 
are tabulated in Table 1. 

F1-score is the harmonic mean of Precision and Recall.
As a result of our experiments, SVM and Decision tree
were found to yield the highest F1-score of 97% and
accuracy of 96%; however, SVM performs better in terms
of Recall. In medical cases, Recall is an important metric.
Hence, we can conclude that SVM is the most suitable
classifier in this case.

Table 1. Comparison of classifier performance.

Classifiers   Accuracy   Precision   Recall   F1 score
SVM           0.96       0.96        0.98     0.97
DT            0.96       0.98        0.96     0.97
LR            0.95       0.93        0.99     0.96
NB            0.88       0.89        0.95     0.92
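
As a quick consistency check, the F1 scores in Table 1 can be
recomputed from the reported Precision and Recall values using the
harmonic-mean definition. The dictionary below simply restates the
table; the helper name is ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in Table 1.
table_1 = {
    "SVM": (0.96, 0.98),
    "DT": (0.98, 0.96),
    "LR": (0.93, 0.99),
    "NB": (0.89, 0.95),
}

f1_values = {name: round(f1_score(p, r), 2) for name, (p, r) in table_1.items()}
print(f1_values)  # {'SVM': 0.97, 'DT': 0.97, 'LR': 0.96, 'NB': 0.92}
```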


Multi-class classification 

We employed SVM, Decision tree and a 6-layer CNN
to classify tumours in the Pituitary, Meningioma and
Glioma categories. Performance of the classifiers was
compared using the metrics accuracy, precision, recall
and F1 score. These results are tabulated in Table 2.
Confusion matrices showing class-wise accuracy are
depicted in Figure 6.

The CNN was trained and evaluated over 30 epochs.
Figure 5 plots the loss and accuracy with respect to
epochs.

Figure 6. Multi-class classification confusion matrices.


Table 2. Performance metrics for multi-class
classification.

Classifiers   Accuracy   Precision   Recall   F1 score
SVM           0.83       0.85        0.72     0.83
DTC           0.80       0.80        0.80     0.80
CNN           0.93       0.94        0.92     0.93

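
For the multi-class case, precision and recall are computed per
class from the confusion matrix and then averaged. The sketch below
shows this on a small hypothetical 4-class matrix; the counts are
invented for illustration, and macro-averaging is an assumption,
since the averaging scheme behind Table 2 is not stated.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall) pair for each class.

    confusion[i][j] is the number of class-i images predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        results.append((tp / predicted_k, tp / actual_k))
    return results


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Hypothetical counts; rows and columns follow the order
# No tumor, Pituitary, Meningioma, Glioma.
confusion = [
    [8, 1, 1, 0],
    [0, 9, 1, 0],
    [1, 1, 7, 1],
    [0, 0, 2, 8],
]

metrics = per_class_metrics(confusion)
accuracy = sum(confusion[k][k] for k in range(4)) / sum(map(sum, confusion))
macro_precision = macro_average([p for p, _ in metrics])
macro_recall = macro_average([r for _, r in metrics])
print(accuracy)  # 0.8
```
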
Conclusions 


Accurate and early detection and classification of 
brain tumors from MRI scans is imperative for effective 
and timely treatment. Manual techniques take a lot of 
time due to the large amount of data involved. Also, the 
results are non-reproducible as they depend on the 
radiologist’s expertise. In this study, we make use of 
brain MRI scans in the BraTS 2020 dataset and classify 
images as normal brains and brains containing tumorous 
growth. Supervised machine learning classifiers (Support
Vector Machine, Decision Tree, Linear Regression and
Naive Bayes) have been used for binary
classification. As a result of our binary classification
experiment, we found that SVM outperforms the other 
classifiers with an accuracy of 96% and Recall rate of 
98%. Thereafter, we further categorize images into four 
classes - (i) No tumor, (ii) Pituitary adenoma, (iii) 
Meningioma, & (iv) Glioma. We evaluate and compare 
the performance of SVM-RBF, Decision tree and CNN 
for multi-class classification. We use a 6-layer CNN 
model with 5 convolution layers with different filter 
sizes, ReLU activation function, and one fully connected 
layer. As a result of our experiments, CNN was found to 
outperform the other classifiers with an accuracy of 93% 
and a Recall rate of 92%. 

This work can be extended to determine the grade and 
size of a tumor in tumorous images. This can help 
surgeons and oncologists to get an accurate estimate of 
the type of tumor and prescribe the best course of 
treatment for the patient. Image segmentation algorithms 
can be applied to the images showing tumorous growth. 
This can help estimate the area and volume of the 
tumorous region, which will aid the oncologist to perform 
surgery accurately and precisely. The accuracy and 
timeliness of analysis benefit both the patient and the 
doctor to effect a timely cure or treatment. 